Policy Outputs

Privacy (PII)

  • redacted_text - The input text with the specified entity, regex, and blocked types redacted
  • redacted_entities - Dict mapping each redacted type to a dict that maps each unique placeholder (for example, <LOC_1>) to the list of entities it replaced
  • redacted_entity_positions - List of tuples, each containing a unique placeholder followed by the start and end positions of the span in the plaintext it refers to

Example:

{
  "redacted_entities": {
    "LOC": {
      "<LOC_1>": [
        "US"
      ]
    }
  },
  "redacted_entity_positions": [
    [
      "<LOC_1>",
      19,
      26
    ]
  ],
  "redacted_text": "Who is the current <LOC_1> president?"
}
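
To make the relationship between the three fields concrete, here is a minimal Python sketch that parses the example output above. The helper names placeholder_spans and restore are hypothetical and not part of any documented API. The sketch assumes the positions are a half-open [start, end) range; in this example the span 19 to 26 lines up with the placeholder inside redacted_text, and whether that holds for every output is an assumption.

```python
import json

# Example output copied from the documentation above.
raw = """
{
  "redacted_entities": {"LOC": {"<LOC_1>": ["US"]}},
  "redacted_entity_positions": [["<LOC_1>", 19, 26]],
  "redacted_text": "Who is the current <LOC_1> president?"
}
"""


def placeholder_spans(output):
    """Hypothetical helper: pair each placeholder with the text found at its span.

    Assumes each position entry is (placeholder, start, end) with a
    half-open range indexed into redacted_text, as in the example.
    """
    text = output["redacted_text"]
    return [
        (tag, text[start:end])
        for tag, start, end in output["redacted_entity_positions"]
    ]


def restore(output):
    """Hypothetical helper: put the first replaced value back for each placeholder."""
    text = output["redacted_text"]
    for entities in output["redacted_entities"].values():
        for tag, originals in entities.items():
            text = text.replace(tag, originals[0])
    return text


output = json.loads(raw)
print(placeholder_spans(output))
print(restore(output))
```

Restoring from the first list element is a simplification: if one placeholder replaced several different surface forms, the original wording cannot be recovered exactly from this mapping alone.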

Toxicity

  • classification - Classification (safe or unsafe)
  • reason - Reason for the classification

Hallucination

  • avg_entailment_probability - Average entailment probability. Higher is better

RAG Hallucination

  • retrieval_relevance - Probability representing how relevant the retrieved context is to the user prompt. Higher is better
  • response_faithfulness - Probability representing how faithful the model response is to the retrieved context. Higher is better
  • response_relevance - Probability representing how relevant the model response is to the user prompt. Higher is better
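
Since all three scores are probabilities where higher is better, one simple way to consume them is to flag any score that falls below a cutoff. The sketch below is only an illustration: the weak_scores helper, the 0.5 threshold, and the sample values are all made up for this example and do not come from the product.

```python
# The three score names listed above.
RAG_KEYS = ("retrieval_relevance", "response_faithfulness", "response_relevance")


def weak_scores(scores, threshold=0.5):
    """Hypothetical helper: return the names of scores below the threshold.

    The default threshold of 0.5 is an arbitrary illustrative choice.
    """
    return [key for key in RAG_KEYS if scores[key] < threshold]


# Made-up sample values, for illustration only.
scores = {
    "retrieval_relevance": 0.91,
    "response_faithfulness": 0.32,
    "response_relevance": 0.78,
}

print(weak_scores(scores))
```

Reporting which score is weak, instead of a single pass/fail flag, helps separate a retrieval problem from a generation problem.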

Content (Alignment)

  • guard_classification - Classification (safe/unsafe) the guardrail model gave to the query
  • guard_rationale - Rationale for classification
  • violated - Boolean indicating whether this policy was violated
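
A typed wrapper can make these three fields easier to handle downstream. The following is a minimal sketch, assuming the output arrives as a dict keyed by the field names above; the ContentResult class, the parse_content_result helper, and the sample payload are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContentResult:
    """Hypothetical container mirroring the three documented fields."""
    guard_classification: str  # "safe" or "unsafe"
    guard_rationale: str
    violated: bool


def parse_content_result(payload: dict) -> ContentResult:
    """Build a ContentResult from a dict keyed by the documented field names."""
    return ContentResult(
        guard_classification=payload["guard_classification"],
        guard_rationale=payload["guard_rationale"],
        violated=bool(payload["violated"]),
    )


# Made-up sample payload, for illustration only.
sample = {
    "guard_classification": "unsafe",
    "guard_rationale": "Made-up rationale for illustration.",
    "violated": True,
}

result = parse_content_result(sample)
if result.violated:
    print(f"Policy violated: {result.guard_rationale}")
```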